Speakers unconsciously tend to mimic their interlocutor's speech during communicative interaction. This study aims to examine the neural correlates of phonetic convergence and deliberate imitation, in order to explore whether imitation of phonetic features, whether deliberate or unconscious, might reflect a sensory-motor recalibration process. Sixteen participants listened to vowels with pitch varying around the average pitch of their own voice, and then produced the identified vowels, while their speech was recorded and their brain activity was imaged using fMRI. Three types and degrees of imitation were compared (unconscious, deliberate, and inhibited) using a go/no-go paradigm, which enabled the comparison of brain activations during the whole imitation process, its active perception step, and its production step. Speakers followed the pitch of the voices they were exposed to, even unconsciously, without being instructed to do so. After being informed about this phenomenon, 14 participants were able to inhibit it, at least partially. The results of whole-brain and ROI analyses support the idea that both deliberate and unconscious imitation rely on similar neural mechanisms and networks, involving regions of the dorsal stream, during both the perception and production steps of the imitation process. While no significant difference in brain activation was found between unconscious and deliberate imitation, the degree of imitation appears to be determined by processes occurring during the perception step. Indeed, during the perception step, four regions of the dorsal stream (bilateral auditory cortex, bilateral supramarginal gyrus (SMG), and left Wernicke's area) showed activity that correlated significantly with the degree of imitation.

Keywords: phonetic convergence, imitation, speech production, speech perception, sensory-motor interactions, 
internal models 
INTRODUCTION

When they interact, speakers tend to imitate their interlocutor's posture (Shockley et al., 2003), gestures, facial expressions, and breathing (Chartrand and Bargh, 1999; Estow and Jamieson, 2007; Sato and Yoshikawa, 2007). Regarding their interlocutor's speech, such convergence effects also occur at the phonetic, lexical, and syntactic levels (Natale, 1975; Pardo, 2006; Delvaux and Soquet, 2007; Kappes et al., 2009; Aubanel and Nguyen, 2010; Bailly and Lelong, 2010; Miller et al., 2010; Babel, 2012; Babel and Bulatov, 2012). The phenomenon of "phonetic convergence," also referred to as "accommodation," "entrainment," "alignment," or "chameleon effect," not only concerns supra-segmental parameters such as vocal intensity (Natale, 1975), fundamental frequency (f0) (Gregory et al., 1993, 2000; Bosshardt et al., 1997; Goldinger, 1997; Babel and Bulatov, 2012), and long-term average spectrum (Gregory et al., 1993, 1997, 2000; Gregory and Webster, 1996), but also temporal and spectral cues to phonemes, such as the voice onset time (VOT) of stop consonants (Sancier and Fowler, 1997; Shockley et al., 2004; Nielsen, 2011) and the first two formants of vowels (F1, F2; Babel and Bulatov, 2012; Pardo, 2010; Sato et al., 2013). This phenomenon appears to be quite subtle, with f0 and speech rate showing the greatest sensitivity to phonetic convergence (Pardo, 2010; Sato et al., 2013).

Most of the literature considers this convergence phenomenon 
as primarily driven by social or communicative motivations. 
Convergence behaviors may aim at placing the interaction on a 
"common ground" of sounds and gestures, which is hypothe- 
sized to improve communication at the social level and/or at the 
intelligibility level. 

Several theories predict that speakers converge more toward people they like and by whom they want to be liked in return (Byrne, 1997; Chartrand and Bargh, 1999; Babel, 2009), toward people they are acquainted with (Lelong and Bailly, 2012), or toward people who exert leadership (Pardo, 2006) or any kind of social dominance or hierarchy over them (Gregory and Hoyt, 1982; Street and Giles, 1982; Gregory, 1986; Giles et al., 1991; Gregory and Webster, 1996). More generally, Communication Accommodation Theory (CAT; Giles et al., 1991) considers phonetic convergence and divergence as social tools to mark the desire to belong to, or to distinguish oneself from, a social group (Giles et al., 1973; Giles, 1973; Bourhis and Giles, 1977; Tajfel and Turner, 1979; Giles et al., 1991). Work by Krashen (1981) and Pardo (2006) supports the idea that phonetic convergence is driven by empathy rather than by the desire to be liked. In any case, no evidence has yet been provided to support the idea that we like people more if they converge toward us [although, conversely, previous studies showed that we like people more after imitating them (Adank et al., 2013)].

Phonetic convergence is also believed to improve communication at the intelligibility level. Producing speech sounds and lexical forms that are more similar to the interlocutor's own repertoire may facilitate phonetic decoding and lexical access. However, no study has yet shown such intelligibility benefits [although, conversely, previous studies showed that it is easier to understand an accent after imitating it (Adank et al., 2010)]. In fact, this idea appears to be contradicted by the observation that our own speech, which could not be more similar to our own sound repertoire, is not more intelligible to us than speech produced by others (Hawks, 1985).

Several additional observations lead us to partly reconsider the idea that phonetic convergence may be primarily driven by social and communicative motivations. First, phonetic convergence was also observed in non-interactive speech production tasks (Goldinger, 1998; Namy et al., 2002; Shockley et al., 2004; Vallabha and Tuller, 2004), even at the basic level of vowel production (Sato et al., 2013). Partial imitation of lip gestures and vocal sounds was also observed in newborns and young children (Heimann et al., 1989; Meltzoff and Moore, 1997). Such imitation processes appear to be involuntary (Garrod and Clark, 1993) and are believed to play a key role in cognitive development, in particular in language acquisition (Chen et al., 2004; Serkhane et al., 2005; Nagy, 2006). Some authors support the idea that these automatic imitation processes still exist in adults but may be neutralized by inhibition processes. They base this hypothesis on the observation of patients with frontal brain lesions and a loss of social inhibition, who systematically repeat and imitate their interlocutor (Brass et al., 2003, 2005; Spengler et al., 2010). These studies suggest that imitation may be innate and involuntary, whereas inhibiting imitation, and controlling the degree of this inhibition, is what we learn with age.

Rizzolatti and colleagues (e.g., Iacoboni et al., 1999; Rizzolatti et al., 2001; Gallese, 2003) have proposed a "direct matching" between perception and action as the basis for the imitation of motor tasks. The main empirical support for this theoretical proposal comes from the discovery of mirror neurons in the macaque brain (Rizzolatti et al., 1996, 2002). Mirror neurons are a small subset of neurons, found in the macaque ventral premotor cortex and anterior inferior parietal lobule, that fire both during the production of goal-directed actions and during the observation of a similar action performed by another individual. In humans, homologous brain regions were also found to be involved in both action perception and production (notably the pars opercularis of Broca's area, located in the posterior part of the inferior frontal gyrus; Rizzolatti and Arbib, 1998). Such "motor resonance" was observed not only for finger, hand, and arm movements (Tanaka and Inui, 2002; Buccino et al., 2004; Molnar-Szakacs et al., 2005), but also for mouth and lip movements (Fadiga et al., 2002; Wilson, 2004; Skipper et al., 2007; D'Ausilio et al., 2011). This overlapping network appears to be hard-wired, or at least to function from the very beginning of life (Sommerville et al., 2005; Nystrom, 2008).

Regarding speech, a number of models also support the idea of a direct matching between perception and motor systems (for reviews, see Galantucci et al., 2006; Schwartz et al., 2012). Motor theories of speech perception argue that speech is primarily perceived as articulatory gestures (Liberman and Mattingly, 1985; Fowler, 1986), and sensory-motor theories postulate that phonetic coding/decoding and representations are shared by the speech production and perception systems (Skipper et al., 2007; Rauschecker and Scott, 2009; Schwartz et al., 2012). Brain imaging studies provide evidence for an involvement of the motor system in speech perception (Fadiga et al., 2002; Wilson, 2004; Skipper et al., 2007). Anatomical connections between the posterior superior temporal regions, the inferior parietal lobule, and the posterior ventrolateral frontal lobe in the premotor cortex were attested using diffusion tensor magnetic resonance imaging (Catani and Jones, 2005). Recent neurobiological models of speech perception and production postulate the existence of a dorsal sensory-motor stream (Hickok and Poeppel, 2000, 2004, 2007; Poeppel and Hickok, 2004) that maps acoustic representations onto articulatory representations and includes the posterior inferior frontal gyrus, the premotor cortex, the posterior superior temporal gyrus/sulcus, and the inferior parietal lobule (Callan et al., 2004; Guenther, 2006; Skipper et al., 2007; Dick et al., 2010).

To sum up, all these observations and models support the idea that humans have shared representations of the motor commands of an action and of its sensory consequences. This functional coupling between perception and action systems, through shared representations, suggests that perception does not only consist of decoding information, but also contributes to the automatic and involuntary "update," or "recalibration," of these shared sensory-motor representations.

This leads us to reconsider the mechanisms underlying phonetic convergence and to explore the hypothesis that automatic and involuntary imitation of phonetic features might reflect sensory-motor learning that takes place as soon as speech is perceived. This hypothesis is supported by the fact that speakers modify their way of speaking not only during the interaction with their interlocutor, but also after it. This "after-effect" concerns not only speech production but also speech perception: vowel categorization was found to be modified after repeated exposure to someone else's speech (Sato et al., 2013). Furthermore, passive listening, without any motor involvement, appears to be sufficient to produce these after-effects (Sato et al., 2013).

The present study aimed at determining the neural substrates 
of phonetic convergence and more particularly at: (1) under- 
standing whether phonetic convergence and deliberate imitation 
of speech are underpinned by the same neurocognitive mecha- 
nisms, (2) examining to what extent sensory-motor brain areas 
are involved during deliberate and unconscious imitations of 
speech, and (3) better understanding the degree of control and 
consciousness that one can have on imitation and its inhibition. 

On the basis of previous studies showing the involvement of the dorsal stream in voluntary imitation of speech (Damasio and Damasio, 1980; Caramazza et al., 1981; Bartha and Benke, 2003; Molenberghs et al., 2009; Irwin et al., 2011; Reiterer et al., 2011) and in fast overt repetition (Peschke et al., 2009), we assumed that the dorsal stream is also involved in phonetic convergence. We expected deliberate imitation and unconscious convergence to be based on the same mechanisms, but to involve a modulation of dorsal stream activation, particularly during the perception step of the perception-action loop. Finally, we also hypothesized that phonetic convergence can be inhibited to some extent, and that this inhibition also relies on changes in dorsal stream activity.

To explore these hypotheses, we simultaneously recorded the speech signals and neural responses of 16 participants in three speech imitation tasks with varying degrees of will and awareness: voluntary imitation, phonetic convergence, and intended inhibition of phonetic convergence. In these tasks, we focused on one phonetic feature that is particularly sensitive to this phenomenon: f0, which was varied specifically for each participant, from -20 to +20% around his/her own average pitch. We used a go/no-go paradigm in order to compare brain activations during the whole imitative process with those during its perception and production steps only. In addition, two speech control tasks (passive perception and production) were included in order to compare brain activations during the perception and motor steps of the imitative process with brain activations during ordinary perception and non-imitative production of vowels.

METHODS 
PARTICIPANTS 

Sixteen right-handed, healthy participants (11 males and 4 females, aged 27 ± 5 years), all native speakers of French, volunteered to participate in the experiment. None of them had any speech or hearing disorders, and none had previously received explicit information about phonetic convergence phenomena. The study received ethical approval from the Centre Hospitalier Universitaire de Grenoble, the Comité de Protection des Personnes pour la Recherche Biomédicale de Grenoble, and the Agence Française de Sécurité Sanitaire des Produits.

PROCEDURE 

The experiment consisted of three tasks of interest and two 
reference tasks. 

T1. Reference task: passive auditory perception of vowels.

T2. Vowel production task. The vowels to be produced were played to the participant through headphones. Participants were expected to partly and unconsciously imitate these stimuli (convergence effect).

T3. Vowel production reference task. The vowels to be produced were displayed on a screen viewed by the participant. Participants were expected to produce vowels according to their own speech representations.

T4. Vowel imitation task. As in T2, vowels were played to the participant through headphones. Participants were asked to produce these vowels and to «imitate the voice heard».

T5. Vowel production and convergence inhibition task. Participants were briefly informed about the existence of convergence phenomena. As in T2 and T4, vowels were played to the participant through headphones. They were asked to produce these vowels as close as they could to their habitual production, trying not to follow the stimuli.

Participants were simply informed that the experiment would consist of the production and perception of vowels. The first two tasks were presented as such to the participants, so that they would not suspect that the audio stimuli might influence their own production. The voluntary imitation and inhibition tasks were therefore left for the end of the experiment. These five tasks were followed by an anatomical brain scan. The whole procedure was completed in one and a half hours.

The audio stimuli used in conditions T1, T2, T4, and T5 consisted of 27 different vowels, specifically selected for each participant. First, a vowel database with modified pitches was created from 3 French vowels ([e], [oe], [o]) produced by a reference male speaker and a reference female speaker. Pitches were artificially shifted in steps of 5 Hz, from 80 to 180 Hz for the male vowels and from 150 to 350 Hz for the female vowels. This pitch manipulation was performed using the PSOLA module integrated in Praat, which makes it possible to modify pitch without affecting formants or speech rate. Before the experiment, each participant was also recorded while producing a series of vowels, in order to determine his/her habitual pitch (see Table 1). Finally, for each participant, 27 stimuli (3 vowels × 9 pitches) were selected from the vowel database, using the 9 available frequencies closest to 80, 85, 90, 95, 100, 105, 110, 115, and 120% of his/her habitual pitch. The visual stimuli used in condition T3 consisted of the 3 symbols «ee», «eu», and «oo».
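The per-participant stimulus selection described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the database holds exactly one token per 5 Hz step within the quoted ranges and that "closest" means the smallest absolute difference in Hz; the function names and the choice of the male or female grid per participant are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the stimulus selection step (not the authors' code).

def pitch_grid(low_hz: int, high_hz: int, step_hz: int = 5) -> list[int]:
    """Pitches available in the manipulated-vowel database (5 Hz steps, inclusive)."""
    return list(range(low_hz, high_hz + 1, step_hz))

MALE_GRID = pitch_grid(80, 180)     # resynthesized male reference vowels
FEMALE_GRID = pitch_grid(150, 350)  # resynthesized female reference vowels
TARGET_PERCENTS = [80, 85, 90, 95, 100, 105, 110, 115, 120]
VOWELS = ["e", "oe", "o"]

def select_stimulus_pitches(habitual_hz: float, grid: list[int]) -> list[int]:
    """For each target percentage of habitual f0, pick the nearest database pitch.

    Targets outside the grid are implicitly clipped to its nearest edge.
    """
    return [min(grid, key=lambda f: abs(f - habitual_hz * p / 100))
            for p in TARGET_PERCENTS]

def build_stimulus_set(habitual_hz: float, grid: list[int]) -> list[tuple[str, int]]:
    """3 vowels x 9 pitches = 27 stimuli for one participant."""
    pitches = select_stimulus_pitches(habitual_hz, grid)
    return [(vowel, f0) for vowel in VOWELS for f0 in pitches]

# Example: a male speaker with a habitual pitch of 120 Hz.
print(select_stimulus_pitches(120, MALE_GRID))
# [95, 100, 110, 115, 120, 125, 130, 140, 145]
```

Because the grid is quantized in 5 Hz steps, the selected pitches only approximate the 5% targets (e.g., 96 Hz becomes 95 Hz), which is why the text describes the stimulus range as "approximately" -20 to +20%.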

The two reference tasks (T1 and T3) consisted of 54 trials. The 27 audio or visual stimuli described above were presented in a pseudo-random order, alternating with 27 «void» stimuli (i.e., no sound in T1 or no displayed vowel in T3). These «void» stimuli were used as a baseline for the comparison of neural activations. Each trial lasted 10 s. Stimuli were played (T1) or displayed (T3) during the first 500 ms. One second later, a fixation cross was displayed for 500 ms, indicating when the participant had to produce the vowel in the T3 condition.

Table 1 | Participants' information.

The three tasks of interest (T2, T4, and T5) consisted of 81 trials. In these tasks, the 27 audio stimuli were presented twice, in a pseudo-random order, alternating with 27 «void» stimuli (i.e., no sound). Concretely, on one third of the trials, audio stimuli were followed by a green cross, indicating that the participant should produce the vowel («Go»). On another third, audio stimuli were followed by a red cross, meaning that the participant should remain quiet («No Go»). On the last third, no stimulus was played and a red cross was displayed (Baseline). This go/no-go paradigm makes it possible to compare the neural activations in a dual task of speech perception and production with those in a task of «active» perception, i.e., when participants perceive vowels with the goal of producing them afterwards but ultimately do not carry out any motor action.
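The trial structure of one go/no-go run can be sketched as follows. This assumes that each audio stimulus is paired once with a Go cue and once with a NoGo cue, which the text does not state explicitly, and it replaces the unspecified pseudo-randomization constraints with a plain shuffle; `make_gonogo_run` is a hypothetical helper.

```python
import random
from collections import Counter

def make_gonogo_run(stimuli, seed=None):
    """Hypothetical trial list for one 81-trial run of T2, T4 or T5.

    Assumption: each audio stimulus appears once with a Go cue and once
    with a NoGo cue; a plain shuffle stands in for the (unspecified)
    pseudo-randomization constraints.
    """
    stimuli = list(stimuli)
    rng = random.Random(seed)
    trials = ([(s, "Go") for s in stimuli]             # green cross: produce
              + [(s, "NoGo") for s in stimuli]         # red cross: stay quiet
              + [(None, "Baseline")] * len(stimuli))   # silence + red cross
    rng.shuffle(trials)
    return trials

run = make_gonogo_run(range(27), seed=1)
print(len(run), Counter(cue for _, cue in run))
```

With this design, contrasting Go with NoGo volumes isolates the production step, while contrasting NoGo with Baseline volumes isolates «active» perception, which matches the contrasts listed in Table 2.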

MATERIAL AND DATA ACQUISITION 

Visual instructions were displayed on a screen located behind the participant, using a video projector and the Presentation software (Neurobehavioral Systems, Albany, CA, USA). Participants could read them by reflection in a mirror placed above their eyes. Audio stimuli were played through MRI-compatible headphones. The audio level was set high enough for participants to hear the stimuli correctly, despite the earplugs they wore to protect them from the scanner noise. Vowel productions were recorded with a microphone placed 1 m away from the participant's mouth.

Anatomical and functional images were acquired with a whole-body 3T scanner (Bruker MedSpec S300) equipped with a transmit/receive quadrature volume head coil. The fMRI experiment consisted of five functional runs and one anatomical run. Functional images were obtained using a T2*-weighted echo-planar imaging (EPI) sequence with whole-brain coverage (TR = 10 s, acquisition time = 2600 ms, TE = 30 ms, flip angle = 90°). Each functional scan comprised forty axial slices parallel to the anterior commissure-posterior commissure (AC-PC) plane, acquired in interleaved order (72 × 72 matrix; field of view: 216 × 216 mm²; 3 × 3 mm² in-plane resolution with a slice thickness of 3 mm, without gap). A «sparse sampling» acquisition paradigm was used in order to minimize the artifacts that articulatory movements could induce on functional images. This acquisition technique takes advantage of the time delay between the neural activity linked to a motor or perceptual task and the associated hemodynamic response. Based on previous estimates of this delay in speech production and perception tasks (Grabski et al., 2013), the functional scan was set to start 4.7 s after stimulus perception, thus 3.7 s after vowel production (in T2-T5). A high-resolution T1-weighted whole-brain structural image was acquired for each participant after the third functional run (MP-RAGE, sagittal volume of 256 × 224 × 176 mm³ with a 1 mm isotropic resolution, inversion time = 900 ms, two segments, segment repetition time = 2500 ms, segment duration = 1795 ms, TR/TE = 16/5 ms with 35% partial echo, flip angle = 8°).
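The sparse-sampling timing can be checked with simple arithmetic. The sketch below uses the TR, acquisition time, and scan onset quoted above; the 1 s production lag is inferred from the difference between 4.7 s and 3.7 s and is an assumption, as is the helper name `sparse_trial_timeline`.

```python
def sparse_trial_timeline(tr_s=10.0, acq_s=2.6, scan_onset_s=4.7,
                          production_lag_s=1.0):
    """Timing of one sparse-sampling trial, in seconds from stimulus onset.

    production_lag_s is an assumption: the text implies vowel production
    about 1 s after the stimulus (4.7 s - 3.7 s).
    """
    scan_end_s = scan_onset_s + acq_s
    if scan_end_s > tr_s:
        raise ValueError("volume acquisition would overlap the next trial")
    return {
        "scan_window_s": (round(scan_onset_s, 3), round(scan_end_s, 3)),
        "scan_after_production_s": round(scan_onset_s - production_lag_s, 3),
        # scanner is silent between volumes, i.e., while listening/speaking
        "silent_gap_s": round(tr_s - acq_s, 3),
    }

print(sparse_trial_timeline())
# {'scan_window_s': (4.7, 7.3), 'scan_after_production_s': 3.7, 'silent_gap_s': 7.4}
```

The check shows that each 2.6 s volume ends well before the next trial starts, so stimulus presentation and vowel production always fall within the 7.4 s of scanner silence.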



ACOUSTIC ANALYSIS 

The acoustic analyses were performed using Praat software. A semi-automatic procedure was used to segment vowels on the basis of intensity and duration criteria. Hesitations and mispronunciations were removed from the analyses. f0 values were calculated using an autocorrelation method, over a time interval of ±25 ms around the point of maximum intensity of the sound file.
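As a rough illustration of this measurement, the sketch below estimates f0 from the autocorrelation of a ±25 ms window centred on the point of maximum short-term energy. It is a deliberately simplified stand-in for Praat's autocorrelation pitch algorithm, which adds windowing corrections, candidate tracking, and interpolation; the function name, the 10 ms energy frame, and the 75-500 Hz search range are assumptions.

```python
import numpy as np

def f0_at_intensity_peak(signal, sr, half_window_s=0.025,
                         fmin=75.0, fmax=500.0, frame_s=0.010):
    """Simplified autocorrelation f0 estimate (not Praat's exact algorithm).

    1. locate the sample of maximum short-term energy;
    2. keep +/-25 ms of signal around it;
    3. return sr / lag of the highest autocorrelation value whose lag
       lies between 1/fmax and 1/fmin.
    """
    x = np.asarray(signal, dtype=float)
    frame = int(frame_s * sr)
    # short-term energy: moving sum of squared samples
    energy = np.convolve(x ** 2, np.ones(frame), mode="same")
    peak = int(np.argmax(energy))
    half = int(half_window_s * sr)
    seg = x[max(0, peak - half): peak + half]
    seg = seg - seg.mean()
    # one-sided autocorrelation (lags 0 .. len(seg) - 1)
    ac = np.correlate(seg, seg, mode="full")[len(seg) - 1:]
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(seg) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag

# Synthetic 300 ms "vowel": a 120 Hz tone under a Hann envelope.
sr = 16000
t = np.arange(int(0.3 * sr)) / sr
x = np.hanning(t.size) * np.sin(2 * np.pi * 120 * t)
print(round(f0_at_intensity_peak(x, sr), 1))  # a value close to 120
```

Restricting the search to lags below 1/fmin avoids picking the second period peak (an octave error); the integer-lag resolution limits precision to roughly 1% at these pitches, which parabolic interpolation around the peak would refine.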

The stimuli were specific to each participant, with f0 values varying between approximately -20 and +20% of the participant's average habitual f0 (see Table 1). Consequently, the f0 values of the produced vowels were also converted to the participant's range, expressed as the percentage of deviation from his/her average habitual f0:

$$\Delta f_0 = 100 \times \frac{f_{0,\mathrm{produced}} - f_{0,\mathrm{habitual}}}{f_{0,\mathrm{habitual}}}$$
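The normalization above amounts to a one-line computation. A minimal sketch (the function name is illustrative): applying the same transformation to the stimulus f0 values puts stimuli and productions on a common percentage scale.

```python
def delta_f0_percent(f0_produced_hz: float, f0_habitual_hz: float) -> float:
    """Deviation of a produced f0 from the speaker's habitual f0, in percent."""
    return 100.0 * (f0_produced_hz - f0_habitual_hz) / f0_habitual_hz

print(delta_f0_percent(132.0, 120.0))  # 10.0
print(delta_f0_percent(108.0, 120.0))  # -10.0
```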

fMRI DATA PREPROCESSING AND STATISTICAL ANALYSIS 

Data were analyzed with SPM5 software (Statistical Parametric Mapping, Wellcome Trust Centre for Neuroimaging, London, UK). The fMRI data of one participant (S3) contained artifacts caused by a metallic pin and could therefore not be included in the analysis. The results reported in the fMRI data section of this article thus concern the 15 remaining participants.

For each participant, functional images were realigned, nor- 
malized in the reference space of the Montreal Neurological 
Institute (MNI) and smoothed with a 6 mm width Gaussian 
low-pass filter. 

The hemodynamic responses corresponding to the experimen- 
tal conditions were then estimated with a general linear model, 
including the characterization of a unique impulse response for 
each functional scan and taking body movements into account 
through regressors of non-interest. 
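In a sparse-sampling design, each volume samples the response to a single trial, so each condition can be modelled by one indicator column (a single-scan impulse response) alongside the motion regressors. The sketch below fits such a model for one voxel by ordinary least squares and computes a t value for a contrast. It is a simplified illustration: SPM's estimation includes further steps (e.g., high-pass filtering and serial-correlation modelling) that are omitted here, and all names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_sparse_glm(y, conditions, motion, n_conditions):
    """Ordinary least-squares GLM for one voxel of a sparse-sampling run.

    y          : (n_scans,) voxel signal, one volume per trial
    conditions : (n_scans,) integer condition index of each volume
    motion     : (n_scans, k) realignment parameters (regressors of non-interest)
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    x_cond = np.zeros((n, n_conditions))
    x_cond[np.arange(n), conditions] = 1.0   # single-scan impulse per condition
    X = np.column_stack([x_cond, motion])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X

def t_statistic(y, X, beta, c):
    """t value for contrast vector c (zeros on the nuisance columns)."""
    resid = y - X @ beta
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = resid @ resid / dof
    var_c = sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c)
    return float(c @ beta / np.sqrt(var_c))

# Synthetic voxel: conditions 0 = Go, 1 = NoGo, 2 = Baseline (rest).
rng = np.random.default_rng(0)
cond = np.repeat([0, 1, 2], 27)
motion = rng.normal(scale=0.1, size=(81, 6))
y = (np.array([3.0, 2.0, 1.0])[cond] + motion @ rng.normal(size=6)
     + rng.normal(scale=0.1, size=81))
beta, X = fit_sparse_glm(y, cond, motion, n_conditions=3)
go_minus_nogo = np.r_[1.0, -1.0, 0.0, np.zeros(6)]
print(beta[:3].round(2), round(t_statistic(y, X, beta, go_minus_nogo), 1))
```

The condition indicators together span the constant term, so no separate intercept column is needed; a NoGo minus Rest contrast would simply use the weights [0, 1, -1] on the first three columns.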

Whole brain statistical analysis 

Eight T-contrasts were tested (see Table 2), in order to 
identify brain regions specifically involved in vowel percep- 
tion or production (when listening passively or actively, and 
with different degrees of imitation), compared to a resting 
condition. 



Table 2 | Detail of the eight individual t-contrasts tested, and of the corresponding cognitive tasks explored.

            #   Task   T-contrast           Cognitive task
PERCEPTION  1   T1     Perception - Rest    Passive perception
            2   T2     Task NoGo - Rest     Perception in preparation of production (unconscious imitation)
            3   T4     Task NoGo - Rest     Perception in preparation of deliberate imitation
            4   T5     Task NoGo - Rest     Perception in preparation of inhibited imitation
PRODUCTION  5   T3     Production - Rest    Production from visual instructions
            6   T2     Task Go - NoGo       Production (unconscious imitation)
            7   T4     Task Go - NoGo       Deliberate imitation
            8   T5     Task Go - NoGo       Inhibited imitation
Frontiers in Psychology | Cognitive Science 



September 2013 | Volume 4 | Article 600 | 4 



Garnier et al. 



Neural correlates of speech imitation 



Using SPM, a flexible factorial group analysis was con- 
ducted from these individual contrasts, corresponding to a 
One-Way repeated measures ANOVA (one factor TASK with 8 
levels). 

Eight T-contrasts were tested in order to identify brain regions specifically involved in each task of vowel perception and/or production, compared to a resting condition. Two conjunctions were calculated: one across the first four contrasts, examining the neural correlates of vowel perception, and one across the four following contrasts, examining the neural correlates of vowel production. Two F-contrasts tested the main effect of task among the vowel perception conditions (contrasts 1-4) and among the vowel production conditions (contrasts 5-8).
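The conjunction and F-contrast steps can be illustrated on toy data. This sketch assumes a minimum-statistic conjunction (a voxel counts only if it exceeds the threshold in every contrast) and codes a main effect as successive differences between the TASK levels; the helper names are hypothetical, and in practice the threshold would be the corrected one described in the next paragraph.

```python
import numpy as np

def conjunction_mask(t_maps, threshold):
    """Minimum-statistic conjunction: True where every map exceeds threshold."""
    return np.min(np.stack(t_maps), axis=0) > threshold

def main_effect_matrix(levels, n_levels=8):
    """F-contrast rows testing equality of the listed TASK levels (1-based).

    Successive differences (level_i - level_i+1) span the main effect.
    """
    rows = []
    for a, b in zip(levels[:-1], levels[1:]):
        row = np.zeros(n_levels)
        row[a - 1], row[b - 1] = 1.0, -1.0
        rows.append(row)
    return np.array(rows)

# Toy t-maps for three voxels in two contrasts.
print(conjunction_mask([np.array([3.5, 1.0, 4.0]), np.array([3.2, 5.0, 2.0])], 3.0))
# [ True False False]
print(main_effect_matrix([1, 2, 3, 4]))   # perception conditions
```

A voxel strongly active in only one contrast (the second and third toy voxels) is excluded by the conjunction, which is what makes it a test of shared involvement across tasks.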

For these contrasts, results were considered statistically significant at p < 0.05, corrected for multiple comparisons (false discovery rate, FDR, for the perception tasks and family-wise error, FWE, for the production tasks), with activation clusters larger than 25 voxels.
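The two correction schemes can be contrasted on a toy vector of voxel p-values. This sketch assumes the Benjamini-Hochberg step-up procedure for FDR; for FWE it uses a Bonferroni bound as a simple, conservative stand-in (SPM derives its FWE threshold from random field theory instead). The 25-voxel cluster-extent filter is not shown.

```python
import numpy as np

def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg step-up: mask of tests declared significant at FDR q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest rank meeting its threshold
        mask[order[:k + 1]] = True
    return mask

def fwe_bonferroni(p_values, alpha=0.05):
    """Bonferroni bound on the family-wise error rate."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(fdr_bh(p).sum(), fwe_bonferroni(p).sum())  # 2 1
```

FDR controls the expected proportion of false positives among detected voxels and is therefore more lenient than FWE, which controls the probability of any false positive.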

The 3D coordinates of the center of gravity of each activated cluster, normalized in the MNI reference space, were assigned to functional brain areas using the SPM Anatomy toolbox, on the basis of cytoarchitectonic probabilities. When not assigned by the SPM Anatomy toolbox, brain regions were labeled using the Talairach Daemon (Lancaster et al., 2000).

Regions of interest analysis 

This study hypothesizes that the dorsal stream is involved in speech imitation and phonetic convergence. Particular attention was therefore paid to neural activations in regions of the dorsal stream. With the SPM Anatomy toolbox, 7 ROIs were defined in both hemispheres from the cytoarchitectonic probability maps of the following regions:

- Region TE (including TE1.0, TE1.1, and TE1.2)

- Region TE3 (Wernicke's area, including area Spt)

- Supramarginal gyrus (IPC PF, PFm, PFcm)

- Region BA6 (premotor cortex and supplementary motor area)

- Regions BA44 and BA45 (Broca's area)

- The insula

Using MarsBaR, eight T-contrasts (similar to those in Table 2) were tested on individual fMRI data, in order to determine the difference in neural activation in the previously defined ROIs between tasks of vowel perception and/or production (when listening passively or actively, and with different degrees of imitation) and a resting condition.

Using SPSS software, a one-way repeated-measures ANOVA was then conducted on these individual differences in neural activation observed in each ROI. Statistical significance was set at p < 0.001, with post-hoc analyses corrected for multiple comparisons (Bonferroni).
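For readers who want to see what the ROI ANOVA computes, here is a minimal one-way repeated-measures ANOVA in NumPy. It is a sketch, not SPSS's procedure: it returns only the uncorrected F ratio and degrees of freedom (no sphericity correction such as Greenhouse-Geisser, and no p-value, which would require the F distribution, e.g., `scipy.stats.f.sf`). The function name and toy data are illustrative.

```python
import numpy as np

def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA (no sphericity correction).

    data : (n_subjects, k_conditions) array, e.g. one ROI contrast value
           per participant and per task.
    Returns (F, df_conditions, df_error).
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_cond = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between tasks
    ss_subj = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between participants
    ss_total = np.sum((x - grand) ** 2)
    ss_error = ss_total - ss_cond - ss_subj                # task x participant residual
    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    F = (ss_cond / df_cond) / (ss_error / df_err)
    return float(F), df_cond, df_err

F, df1, df2 = rm_anova_oneway([[1, 2, 3], [2, 3, 4], [3, 4, 6]])
print(round(F, 6), df1, df2)  # 37.0 2 4
```

Removing the between-participant sum of squares from the error term is what distinguishes the repeated-measures design from a between-subjects ANOVA; Bonferroni post-hoc tests then multiply each pairwise p-value by the number of comparisons.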

Finally, we performed a Pearson correlation analysis to determine the correlation between the average activation of each ROI for each participant, in the deliberate and unconscious imitation tasks, and the participant's demonstrated degree of imitation (defined from the behavioral data as the slope coefficient of the linear regression between their produced f0 and that of the stimuli they followed).
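The two quantities involved can be sketched as follows: the degree of imitation as the least-squares slope of produced Δf0 against stimulus Δf0, and the Pearson correlation of that slope with ROI activation across participants. The helper names are hypothetical and all numbers are invented toy values, not data from this study.

```python
import numpy as np

def imitation_degree(stimulus_df0, produced_df0):
    """Slope of the least-squares line of produced vs. stimulus delta-f0 (in %)."""
    slope, _intercept = np.polyfit(stimulus_df0, produced_df0, deg=1)
    return float(slope)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy participant: follows half of each pitch shift, plus a 1% offset.
stim = np.array([-20, -15, -10, -5, 0, 5, 10, 15, 20], dtype=float)
produced = 0.5 * stim + 1.0
print(round(imitation_degree(stim, produced), 3))  # 0.5

# Across participants: ROI activation vs. degree of imitation (toy values).
slopes = [0.11, 0.27, 0.45, 0.69, 0.88]
roi_activation = [0.2, 0.5, 0.6, 0.9, 1.1]
print(round(pearson_r(slopes, roi_activation), 2))  # 0.99
```

A slope of 1 means the stimulus pitch is followed fully, 0 means no following, and a negative slope indicates overcompensation; the intercept (a constant pitch offset) does not affect the degree of imitation.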



RESULTS 

BEHAVIORAL RESULTS 

Figure 1 summarizes the average behaviors observed in the deliberate imitation (T4), unconscious imitation (T2), and inhibited imitation (T5) tasks. On average, the observed tendencies confirmed our expectations:

- participants were able to imitate the pitch of the audio stimuli almost perfectly (T4; slope coefficient of 0.87, r = 0.900, p < 0.001).

- participants unconsciously followed the pitch of the audio stimuli in the production task when vowels were presented auditorily (T2; r = 0.635, p < 0.001). This unconscious imitation was, however, not as strong as voluntary imitation (slope coefficient of 0.57). It is worth noting that this order of magnitude is much higher than the convergence effects usually reported in behavioral studies (e.g., a slope coefficient of 0.08 in Sato et al., 2013).

- participants were able to almost completely inhibit this convergence effect when informed about its existence (T5; slope coefficient of 0.08, r = 0.067, p = 0.17).

At the individual level, however, varying behaviors were observed. 
Figure 2 gives an overview of these different individual behaviors. 
Table 3 summarizes the results of the Pearson correlation analysis. 

Some participants demonstrated better imitation abilities than 
others but all of them were able to follow variations of pitch (slope 
coefficients from 0.62 to 1.00). 

Six speakers (see bottom panel of Figure 2) did not show any significant behavioral difference in the variation of f0 between deliberate and unconscious imitation: they fully followed the pitch of the audio stimuli, even in task T2, in which they had not been told about, and were not aware of, convergence effects (slope coefficients from 0.78 to 0.98).

Eight speakers (see top panel of Figure 2) showed a significantly weaker degree of imitation in the unconscious imitation task. The slope of the convergence effect showed great inter-speaker variability, ranging from 0.11 to 0.69.



[Figure 1: produced Δf0 as a function of stimulus Δf0 (-20 to +20% of habitual pitch) in the deliberate, unconscious, and inhibited imitation tasks. Unconscious imitation: r = 0.63, p < .001, slope 0.57. Inhibited imitation: r = 0.07, p = .17, slope 0.08.]
FIGURE 1 | Average behaviors observed in the three tasks of deliberate, unconscious, and inhibited imitation. Vowel stimuli were presented with 9 f0 values, varying around the habitual average pitch of each participant. The y-axis represents how participants modified their produced f0 from their habitual average f0 (see Table 1).

FIGURE 2 | Individual behaviors observed in the three tasks of deliberate, unconscious, and inhibited imitation. Vowel stimuli were presented with 9 f0 values, varying around the habitual average pitch of each participant. The y-axis represents how participants modified their produced f0 from their habitual average f0. The six participants of the bottom panel did not show a significant difference in their behavior between the tasks of deliberate and unconscious imitation.



Great inter-speaker variability was observed in the inhibition task too. Ten out of 16 speakers (S3, S4, S5, S6, S7, S9, S11, S12, S14, S16) did not show a significant correlation between their produced f0 and that of the stimuli in this task, which supports the idea that they were able to inhibit the convergence effect.

Three speakers (S1, S2, S13) showed a significant positive correlation between their produced f0 and that of the stimuli, with a significantly weaker linear regression slope than in the unconscious imitation task (0.15, 0.16, and 0.29, respectively). These speakers were thus able to partially compensate for the convergence effect.

Two speakers (S8 and S10) also showed a significant correlation between their produced f0 and that of the stimuli, but with a linear regression slope almost as high as in the unconscious imitation task (0.51 and 0.65, respectively). Inhibiting the convergence effect was therefore very hard for these participants.

Finally, one of the speakers (S15) even showed a significant but negative correlation between her produced f0 and that of the stimuli (slope coefficient of -0.20, r = -0.500, p = 0.009), indicating a strategy of overcompensation of the convergence effect.

Table 3 | Individual results of Pearson's correlation test, examining the degree of imitation in the tasks of deliberate, unconscious, and inhibited imitation.

          Deliberate imitation    Unconscious imitation   Inhibited imitation
Speaker   Slope   r      p        Slope   r      p        Slope   r      p
S1        0.90    0.96   <0.001   0.88    0.98   <0.001   0.15    0.52   0.005
S2        0.98    0.98   <0.001   0.29    0.59   0.001    0.16    0.48   0.012
S3        1.00    0.98   <0.001   0.83    0.77   <0.001   0.10    0.22   0.26
S4        0.87    0.96   <0.001   0.78    0.97   <0.001   -0.09   -0.21  0.285
S5        0.87    0.95   <0.001   0.19    0.36   0.005    -0.11   -0.36  0.069
S6        0.62    0.93   <0.001   0.11    0.38   0.005    0.03    0.19   0.336
S7        0.65    0.89   <0.001   0.27    0.75   <0.001   0.06    0.32   0.099
S8        0.93    0.96   <0.001   0.69    0.86   <0.001   0.51    0.70   <0.001
S9        0.89    0.99   <0.001   0.57    0.88   <0.001   0.22    0.31   0.112
S10       0.98    0.99   <0.001   0.98    0.99   <0.001   0.65    0.85   <0.001
S11       0.78    0.93   <0.001   0.45    0.74   <0.001   0.01    0.02   0.952
S12       0.94    0.96   <0.001   0.89    0.94   <0.001   -0.86   -0.40  0.039
S13       0.87    0.99   <0.001   0.88    0.99   <0.001   0.29    0.61   <0.001
S14       0.78    0.90   <0.001   0.39    0.80   <0.001   0.10    0.35   0.078
S15       1.00    0.96   <0.001   0.44    0.77   <0.001   -0.20   -0.50  0.009
S16       0.75    0.93   <0.001   0.46    0.66   <0.001   0.18    0.35   0.076

Statistical significance threshold: p < 0.05.
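Table 3 summarizes each speaker's degree of imitation in two ways. The first is the slope of a linear regression of produced f0 on stimulus f0. The second is Pearson's r together with its p value.

The sketch below shows one way to compute both statistics from paired f0 values, along with the t statistic that underlies a Pearson test. Several choices here are assumptions for illustration:
- The function names are hypothetical.
- f0 is expressed as a percentage deviation from habitual pitch. The slope is unit-free as long as both series use the same unit.
- The trial values are invented.

The number of trials per speaker is not given in this section, so n is left as a parameter.

```python
import math
from typing import Sequence, Tuple


def imitation_slope_and_r(stim_f0: Sequence[float],
                          produced_f0: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope of produced f0 on stimulus f0, plus Pearson's r.

    A slope of 1 means the speaker fully followed the stimulus pitch,
    0 means no convergence, and a negative slope means divergence.
    """
    n = len(stim_f0)
    if n != len(produced_f0) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(stim_f0) / n
    mean_y = sum(produced_f0) / n
    sxx = sum((x - mean_x) ** 2 for x in stim_f0)
    syy = sum((y - mean_y) ** 2 for y in produced_f0)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(stim_f0, produced_f0))
    if sxx == 0 or syy == 0:
        raise ValueError("f0 values must vary")
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r


def t_from_r(r: float, n: int) -> float:
    """t statistic with n - 2 degrees of freedom for testing r against 0."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical trials: stimulus and produced f0, in % of habitual pitch.
stimuli = [-20.0, -10.0, 0.0, 10.0, 20.0]
produced = [-9.0, -6.0, 1.0, 4.0, 10.0]
slope, r = imitation_slope_and_r(stimuli, produced)
print(f"slope={slope:.2f}, r={r:.2f}, t={t_from_r(r, len(stimuli)):.2f}")
```

Turning t into a p value requires the cumulative t distribution with n - 2 degrees of freedom. A statistics library such as SciPy (`scipy.stats.pearsonr`) returns r and p directly.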

NEURAL ACTIVATIONS 

Systems of speech perception and production 

The classical neural networks for speech perception and production were observed in the reference tasks of passive vowel perception and vowel production from visual instructions. Surface renderings of the brain activity observed in these reference tasks are displayed in the top left panels of Figures 3 and 4.

Vowel perception and production reference tasks. Passive vowel perception induced large bilateral activation of the superior temporal gyrus (STG), from its anterior part to the temporo-parietal junction. Maximum activity was found in the primary, secondary, and associative auditory cortices, extending to the middle temporal gyrus (MTG), the insula, and the rolandic operculum. Bilateral activations were also observed in the inferior frontal gyrus (IFG), within the pars opercularis and triangularis, extending ventrally to the pars orbitalis in the left hemisphere, and rostrodorsally to BA46 in the right hemisphere. Additional frontal activations were observed bilaterally in the dorsolateral prefrontal cortex, the premotor cortex, and the supplementary motor area (SMA). Superior and inferior parietal activations were observed bilaterally in the supramarginal gyrus (SMG) and the rolandic operculum, and in the left angular gyrus. Further activity was found in limbic structures, in particular in the thalamus and the cingulate cortex (anterior part in the left hemisphere and middle part in the right hemisphere), and in the basal ganglia (right caudate nucleus).

Vowel production from visual instructions induced bilateral activations of the premotor, primary motor, and sensorimotor cortices, and of the SMA. Bilateral activations were also observed in the IFG (pars opercularis and triangularis) and in the STG, extending to the rolandic operculum and the SMG. Additional activations were found bilaterally in superior and posterior parts of the parietal cortex, including the precuneus, the associative cortex, and the angular gyrus. Further activity was found in the left inferior temporal gyrus, and bilaterally in the cerebellum, the cingulate cortex (anterior and middle parts in the left hemisphere, middle part only in the right hemisphere), and the visual cortex.

Speech perception and production with various types and 
degrees of imitation. The typical neural network for speech per- 
ception was found again in the three other tasks of active per- 
ception, in preparation of deliberate, unconscious, or inhibited 
imitations (NoGo trials). Surface rendering of the conjunction of 
the brain activity observed in all the perception tasks is displayed 
in the right panel of Figure 3, with a summary of the maximum 
activation peaks. 
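The conjunction maps keep only regions that are active in every task. One common way to compute such a conjunction is the minimum-statistic approach: a voxel survives if its smallest t value across the task contrasts still reaches the threshold.

The sketch below assumes that approach and represents each task's t-map as a flat list of per-voxel t values. It is not the implementation of any particular neuroimaging package. The threshold and t values are hypothetical, and the cluster-extent criterion used in the figures (25 voxels) is omitted.

```python
from typing import List, Sequence


def conjunction_mask(t_maps: Sequence[Sequence[float]],
                     t_threshold: float) -> List[bool]:
    """Minimum-statistic conjunction over task-specific t-maps.

    Each t-map is a flat sequence of per-voxel t values; a voxel is kept
    only if it reaches the threshold in every map.
    """
    if not t_maps:
        raise ValueError("need at least one t-map")
    n_voxels = len(t_maps[0])
    if any(len(m) != n_voxels for m in t_maps):
        raise ValueError("all t-maps must cover the same voxels")
    # zip(*t_maps) yields, for each voxel, its t values across all tasks.
    return [min(voxel_ts) >= t_threshold for voxel_ts in zip(*t_maps)]


# Three hypothetical perception tasks, four voxels each.
maps = [
    [5.1, 2.0, 4.2, 3.3],
    [4.8, 6.0, 3.9, 2.9],
    [6.3, 5.5, 3.5, 3.4],
]
print(conjunction_mask(maps, t_threshold=3.1))
```

Taking the minimum is deliberately strict. A single task with weak activity in a voxel is enough to exclude it, which matches the idea of a network shared by all tasks.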

This shared perception network involves bilateral activation of the STG, extending to the rolandic operculum and to the left insula. Frontal regions participate in this network in the left hemisphere only, in particular Broca's area (pars opercularis and triangularis of the IFG) and the frontal region BA8. It also involves inferior parietal regions in both hemispheres: the SMG, extending to the rolandic operculum on the left side, and the angular gyrus on the right side. Further shared activations were found in the limbic system (right thalamus and left posterior cingulate cortex). A significant activation of the dorsolateral prefrontal region BA46 was also observed during the perception step of deliberate and inhibited imitations (NoGo trials, see Figure 1). However, the activity of this region was not found to vary significantly between the four different speech perception tasks (see the Whole Brain Analysis paragraph below).

[Figure 3 panels: Passive perception; Perception with preparation of production, with involuntary imitation; Perception with preparation of deliberate imitation; Perception with preparation of production, with inhibited imitation; Perception system (conjunction), with a summary of maximum activation peaks (region, MNI coordinates x, y, z, T-value).]

FIGURE 3 | Surface rendering of brain regions activated in the perception tasks and maximum activation peak summary for their conjunction. All contrasts are computed from the random-effect group analysis (p < 0.05, FDR corrected, cluster extent threshold of 25 voxels, coordinates in MNI space).

Similarly, the typical network for speech production was also observed in the three other speech production tasks with deliberate, unconscious, or inhibited imitations (t-contrast between the Go and the NoGo trials). Surface rendering of the conjunction of the brain activity observed in all the production tasks is displayed in the right panel of Figure 4, with a summary of the maximum activation peaks.
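The production step is isolated by contrasting Go and NoGo trials, and all contrasts come from a random-effect group analysis. As a simplified illustration, the sketch below assumes one Go estimate and one NoGo estimate per subject for a single voxel; the values are hypothetical.

Under that assumption, the production-step effect is a per-subject difference, and the group-level test is a one-sample t test of those differences against zero. Actual analyses fit a general linear model to each voxel's time series with dedicated neuroimaging software. This sketch only mirrors the logic of the second-level test, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def go_minus_nogo(go: Sequence[float], nogo: Sequence[float]) -> List[float]:
    """Per-subject production-step estimate: Go response minus NoGo response."""
    if len(go) != len(nogo):
        raise ValueError("need one Go and one NoGo estimate per subject")
    return [g - n for g, n in zip(go, nogo)]


def one_sample_t(values: Sequence[float]) -> float:
    """Random-effects group t: mean effect divided by its standard error."""
    if len(values) < 2:
        raise ValueError("need at least two subjects")
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


# Hypothetical Go and NoGo estimates for one voxel in five subjects.
go_betas = [1.9, 2.4, 1.6, 2.8, 2.1]
nogo_betas = [0.8, 1.1, 0.9, 1.2, 1.0]
diffs = go_minus_nogo(go_betas, nogo_betas)
print(f"group t = {one_sample_t(diffs):.2f} (df = {len(diffs) - 1})")
```

Treating subjects as the unit of observation is what makes the analysis "random-effect": the inference generalizes to the population of speakers rather than to the trials of the scanned individuals.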

This shared production network involves bilateral activations in the premotor, primary motor, and sensorimotor cortices, extending to the IFG (pars triangularis) and to the SMA. It also involves the primary auditory cortex in the STG, extending to the rolandic operculum and to the insula. Further shared activations were found in posterior parietal regions, including the precuneus and the associative cortex, as well as in the limbic system (anterior cingulate gyrus, thalamus), the cerebellum, the putamen, the red nucleus, and the right basal ganglia (substantia nigra).



Comparison of deliberate, unconscious and inhibited imitations 

Whole brain analysis. Using a corrected statistical analysis, no 
brain region was found to be significantly more or less activated 
between the four speech perception tasks. 

No brain region was found to be significantly modulated in 
activation between the four speech production tasks either. 

ROI analysis: differences between tasks. Tables 4, 5 summarize 
the results of further analysis and comparison of brain activity, 
more specifically in regions of interest of the dorsal stream. 

The ROI analysis showed a significant modulation of brain 
activity in the auditory cortex and Wernicke's area, bilaterally, 
between the four vowel perception tasks with varying degrees and 
types of imitation. No tendency was observed toward increasing 
or decreasing activation with the degree of imitation. 

[Figure 4 panels: Production from visual instructions; Production (Go-NoGo) with involuntary imitation; Production (Go-NoGo) with deliberate imitation; Production system (conjunction), with a summary of maximum activation peaks (region, MNI coordinates x, y, z, T-value).]

FIGURE 4 | Surface rendering of brain regions activated in the production tasks and maximum activation peak summary for their conjunction. All contrasts are computed from the random-effect group analysis (p < 0.05, FWE corrected, cluster extent threshold of 25 voxels, coordinates in MNI space).

For the production tasks, the ROI analysis again highlighted the right auditory cortex as a brain region of the dorsal stream whose activity is significantly modulated between the four vowel production tasks with varying degrees and types of imitation. The left insula and the right SMG were two additional regions of the dorsal stream that showed a significant modulation of their activation. No tendency was found toward increasing or decreasing activation of these regions with the degree of imitation.
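Tables 4 and 5 report a main-effect F and p for each ROI across the four tasks. The statistical model is not detailed in this section. The sketch below assumes a one-way repeated-measures ANOVA on per-subject contrast estimates, arranged as a subjects by tasks table, and shows how the F ratio would be computed under that assumption. With 16 participants and four tasks, such a design would have 3 and 45 degrees of freedom. The function name and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def rm_anova_f(data: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.

    data[i][j] is the estimate for subject i in task j. Returns the F ratio
    for the task effect and its (numerator, denominator) degrees of freedom.
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete subjects x tasks table (at least 2 x 2)")
    grand = sum(sum(row) for row in data) / (n * k)
    task_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_task = n * sum((m - grand) ** 2 for m in task_means)
    # Removing between-subject variance is what makes the design "repeated".
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    ss_error = ss_total - ss_task - ss_subject
    df_task, df_error = k - 1, (n - 1) * (k - 1)
    f_ratio = (ss_task / df_task) / (ss_error / df_error)
    return f_ratio, df_task, df_error


# Toy table: 3 subjects (rows) x 3 tasks (columns).
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 5.0], [3.0, 5.0, 6.0]]
print(rm_anova_f(toy))
```

Subtracting the between-subject sum of squares keeps stable individual differences in overall activation from diluting the task effect. This matters when every participant performs all four tasks.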

ROI analysis: correlations with behavioral data. Table 6 summarizes the results of Pearson's correlation analysis that examined the correlation between brain activity in regions of interest of the dorsal stream and the degree of imitation in the deliberate and unconscious imitation tasks. The degree of imitation was evaluated from the slope of the variation in the produced f0 as a function of the f0 of the stimuli.

Again, in the two active perception tasks preparing a deliberate or unconscious imitation, brain activity in both the left and right auditory cortices was found to correlate significantly (and negatively) with the degree of the subsequent imitation. So did the left Wernicke's area and the SMG, bilaterally.

On the other hand, no brain region showed an activation that correlated significantly with the degree of imitation, either for the production step alone of the deliberate and unconscious imitation tasks (Go-NoGo contrast) or for the entire imitation process of these tasks (Go trials).

Table 4 | Results of the ROI analysis, comparing brain activity in several regions of the dorsal stream between the four speech perception tasks [passive perception, perception in preparation of vowel production (phonetic convergence), of deliberate imitation, or of inhibited imitation].

                                                      Main effect     Contrast estimates
  Lobe      Region                                    F       p       Passive  Conv.   Imit.   Inhib.
LEFT HEMISPHERE
  Temporal  Auditory cortex (TE 1.0, TE 1.1, TE 1.2)  4.672   0.007   0.102    0.052   0.045   0.038
            Wernicke's area (TE3)                     6.312   0.001   0.110    0.072   0.050   0.063
  Parietal  Supramarginal gyrus (IPC PF, PFm, PFcm)   1.302   0.286   0.041    0.026   0.018   0.032
  Frontal   BA6 (Premotor cortex and SMA)             0.121   0.947   0.009    0.001   0.002   0.004
            Broca's area (BA44)                       0.453   0.717   0.028    0.018   0.026   0.027
            Broca's area (BA45)                       1.696   0.205   0.023    0.002   0.005   0.010
  Insular   Insula                                    0.217   0.884   0.026    0.019   0.021   0.017
RIGHT HEMISPHERE
  Temporal  Auditory cortex (TE 1.0, TE 1.1, TE 1.2)  4.643   0.007   0.095    0.063   0.051   0.038
            Wernicke's area (TE3)                     3.164   0.034   0.103    0.075   0.081   0.059
  Parietal  Supramarginal gyrus (IPC PF, PFm, PFcm)   0.874   0.462   0.039    0.029   0.022   0.031
  Frontal   BA6 (Premotor cortex and SMA)             0.204   0.893   0.009    -0.001  0.001   0
            Broca's area (BA44)                       0.174   0.914   0.029    0.023   0.023   0.021
            Broca's area (BA45)                       1.498   0.229   0.027    0.010   0.003   0.014
  Insular   Insula                                    1.998   0.179   0.028    0.013   0.012   0.011

Statistical significance threshold: p < 0.05.

Table 5 | Results of the ROI analysis, comparing brain activity in several regions of the dorsal stream between the four speech production tasks (production from visual instructions, production with phonetic convergence, deliberate imitation, inhibited imitation).

                                                      Main effect     Contrast estimates
  Lobe      Region                                    F       p       Prod.    Conv.   Imit.   Inhib.
LEFT HEMISPHERE
  Temporal  Auditory cortex (TE 1.0, TE 1.1, TE 1.2)  0.971   0.415   0.197    0.175   0.205   0.203
            Wernicke's area (TE3)                     0.846   0.477   0.102    0.082   0.105   0.097
  Parietal  Supramarginal gyrus (IPC PF, PFm, PFcm)   0.925   0.437   0.076    0.051   0.066   0.068
  Frontal   BA6 (Premotor cortex and SMA)             1.745   0.173   0.084    0.064   0.072   0.096
            Broca's area (BA44)                       1.792   0.163   0.086    0.057   0.077   0.095
            Broca's area (BA45)                       1.113   0.355   0.018    0.009   0.028   0.033
  Insular   Insula                                    3.678   0.019   0.100    0.046   0.080   0.091
RIGHT HEMISPHERE
  Temporal  Auditory cortex (TE 1.0, TE 1.1, TE 1.2)  3.294   0.030   0.186    0.134   0.182   0.178
            Wernicke's area (TE3)                     1.439   0.245   0.109    0.087   0.096   0.115
  Parietal  Supramarginal gyrus (IPC PF, PFm, PFcm)   3.559   0.022   0.056    0.014   0.038   0.037
  Frontal   BA6 (Premotor cortex and SMA)             2.694   0.058   0.086    0.063   0.078   0.103
            Broca's area (BA44)                       2.780   0.053   0.113    0.072   0.089   0.120
            Broca's area (BA45)                       1.295   0.289   0.037    0.021   0.040   0.058
  Insular   Insula                                    2.614   0.640   0.093    0.052   0.086   0.081

Statistical significance threshold: p < 0.05.

Table 6 | Results of Pearson's correlation analysis, examining the correlation between the degree of imitation during the tasks of deliberate and unconscious imitation and the brain activity in several regions of the dorsal stream, during the whole imitative process (Go trials), during the perception step only (NoGo trials), and during the production step only (Go-NoGo contrast).

                                                      Perception (NoGo)     Production (Go-NoGo)  Whole process (Go)
  Lobe      Region                                    r          p          r          p          r          p
LEFT HEMISPHERE
  Temporal  Auditory cortex (TE 1.0, TE 1.1, TE 1.2)  -0.45      0.013      -0.03      0.86       -0.29      0.13
            Wernicke's area (TE3)                     -0.38      0.040      0.04       0.82       -0.20      0.28
  Parietal  Supramarginal gyrus (IPC PF, PFm, PFcm)   -0.38      0.038      0.10       0.62       -0.22      0.24
  Frontal   BA6 (Premotor cortex and SMA)             -0.28      0.14       -0.12      0.52       -0.31      0.10
            Broca's area (BA44)                       -0.01      0.98       -0.14      0.47       -0.13      0.49
            Broca's area (BA45)                       -0.25      0.18       0.04       0.82       -0.12      0.53
  Insular   Insula                                    -0.27      0.14       0.05       0.78       -0.12      0.52
RIGHT HEMISPHERE
  Temporal  Auditory cortex (TE 1.0, TE 1.1, TE 1.2)  -0.37      0.045      0.10       0.59       -0.13      0.49
            Wernicke's area (TE3)                     -0.20      0.30       -0.16      0.40       -0.25      0.19
  Parietal  Supramarginal gyrus (IPC PF, PFm, PFcm)   -0.46      0.010      0.09       0.62       -0.21      0.26
  Frontal   BA6 (Premotor cortex and SMA)             -0.26      0.18       -0.09      0.64       -0.25      0.19
            Broca's area (BA44)                       -0.32      0.08       -0.12      0.54       -0.27      0.16
            Broca's area (BA45)                       -0.34      0.06       0.01       0.94       -0.21      0.27
  Insular   Insula                                    -0.02      0.91       0.03       0.88       -0.15      …

Statistical significance threshold: p < 0.05.

DISCUSSION

In line with previous studies on phonetic convergence, our results show that speakers follow and unconsciously imitate the phonetic features (here f0) of voices they are exposed to, even when unaware of this imitation phenomenon and without being instructed to do so. Our results, however, differ from previous studies in the much greater degree of unconscious imitation. Indeed, slope coefficients in unconscious imitation of f0 ranged from 0.11 to 0.99 in our study (0.57 on average), whereas previous studies reported slope coefficients of 0.08 (Sato et al., 2013). Participants were debriefed after the experiment. All of them said that, before performing the deliberate imitation task (T4), they had not inferred that the study dealt with imitation, and that they had neither thought of nor tried imitating the stimuli during the second vowel production task (T2). This discrepancy between our results and those of previous studies cannot be interpreted in terms of interactive vs. non-interactive protocols, as Sato et al. (2013) also used non-interactive tasks of vowel perception and production, similar to this study. One possible explanation is that such a high degree of unconscious imitation may stem from the vowel production task, which may be closer to singing than to speech. Another explanation is that we used participant-specific stimuli in this experiment, varying in f0 from -20% to +20% of each participant's average habitual pitch. The pitch of these stimuli was consequently closer to each participant's own pitch than in other studies on phonetic convergence, where the same stimuli are used for all participants. One may suggest that phonetic convergence is not a linear phenomenon and that speakers are more disposed to completely mimic a voice already similar to their own, because the target is reachable and does not require much more effort to produce than their usual intra-individual variations. Conversely, speakers may show a lesser degree of unconscious imitation toward voices that are very different from their own, because the target is out of reach or would induce vocal discomfort. Supporting this idea, one participant of this experiment (S5) was found to follow the pitch of the stimuli completely for values below 5% of his average habitual pitch (slope of 1.0). Above that pitch height, he did not follow the stimuli further and "saturated" to a constant f0 value (see Figure 2). Another argument comes from studies of intra-individual variations in habitual pitch, which is reported to vary by as much as plus or minus three semitones, i.e., ~18% (Coleman and Markham, 1991). We can thus infer that the participants in our study did not have to make any particular effort to imitate the stimuli they were exposed to, which may explain why the degree of imitation is much greater than in the literature.
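Two quantitative points in the argument above can be made concrete:
- the conversion between semitones and a percentage change in pitch;
- how a saturating, non-linear response lowers the slope of a linear fit.

The sketch assumes equal-tempered semitones. It also assumes a hypothetical speaker who follows the stimuli fully up to a ceiling of +5% and stays flat above it. The stimulus grid and ceiling are illustrative choices, not the study's data, and the function names are hypothetical.

```python
from typing import List, Sequence


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def saturating_response(stim_pct: Sequence[float], ceiling_pct: float) -> List[float]:
    """Toy speaker: follows the stimulus fully (slope 1) up to a ceiling,
    then stays at the ceiling. Values are % deviations from habitual pitch."""
    return [min(s, ceiling_pct) for s in stim_pct]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


up = (semitones_to_ratio(3) - 1) * 100
down = (1 - semitones_to_ratio(-3)) * 100
print(f"+3 semitones: +{up:.1f}%, -3 semitones: -{down:.1f}%")

stimuli = [-20.0, -15.0, -10.0, -5.0, 0.0, 5.0, 10.0, 15.0, 20.0]
produced = saturating_response(stimuli, ceiling_pct=5.0)
print(f"fitted slope over the full range: {ols_slope(stimuli, produced):.2f}")
```

Under these assumptions, three semitones correspond to roughly +19% upward and -16% downward, close to the approximate ±18% cited. The toy speaker converges completely within a limited range, yet the single linear slope over the full ±20% range comes out well below 1 (2/3 here).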

As in previous studies, we also observed great inter-individual variability in the degree of deliberate imitation, with slope coefficients ranging from 0.62 to 1.00. Such a result is consistent with Pfordresher and Brown (2007), who showed that about 15% of the population misses the pitch of a song by more than a semitone when imitating it. Great inter-individual variability was also observed in the degree of unconscious imitation. In particular, a group of six participants showed a similar degree of imitation for deliberate and unconscious imitation (slope coefficients from 0.78 to 1.00), whereas the other participants showed slope coefficients ranging from 0.11 to 0.69. Babel and Bulatov (2012) also reported substantial inter-speaker variability in f0 accommodation, with some participants even diverging from their interlocutor. Although it has been suggested that female talkers may be better imitators (Pardo, 2006), we could not relate the different imitation abilities to the participants' gender or to their level of musical training. It is more likely, as suggested by Postma and Nilsenova (2012) or Lewandowski (2009), that inter-individual differences in the ability to imitate f0 or a foreign accent ("phonetic talent") are related to the neurocognitive capacity to extract acoustic parameters (pitch, in particular) from the speech signal. Supporting this idea, we found a significant correlation between the degree of imitation and brain activity in the auditory cortex, and the lateral Heschl's gyrus is widely considered to be the "pitch processing center" (Bendor and Wang, 2006).
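The grouping described above can be checked against Table 3. The sketch below transcribes the unconscious-imitation slopes from that table and splits speakers at a threshold of 0.75. The threshold and the function name are illustrative choices, not values stated by the authors.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Slope coefficients transcribed from Table 3 (unconscious imitation task).
UNCONSCIOUS_SLOPES: Dict[str, float] = {
    "S1": 0.88, "S2": 0.29, "S3": 0.83, "S4": 0.78, "S5": 0.19, "S6": 0.11,
    "S7": 0.27, "S8": 0.69, "S9": 0.57, "S10": 0.98, "S11": 0.45, "S12": 0.89,
    "S13": 0.88, "S14": 0.39, "S15": 0.44, "S16": 0.46,
}


def split_by_slope(slopes: Dict[str, float],
                   threshold: float) -> Tuple[List[str], List[str]]:
    """Split speakers into high (>= threshold) and low convergers."""
    order = sorted(slopes, key=lambda s: int(s[1:]))  # S1, S2, ..., S16
    high = [s for s in order if slopes[s] >= threshold]
    low = [s for s in order if slopes[s] < threshold]
    return high, low


high, low = split_by_slope(UNCONSCIOUS_SLOPES, threshold=0.75)
low_slopes = [UNCONSCIOUS_SLOPES[s] for s in low]
print("high convergers:", high)
print("low-group slope range:", min(low_slopes), "to", max(low_slopes))
print(f"mean unconscious slope: {mean(UNCONSCIOUS_SLOPES.values()):.2f}")
```

The split recovers six high convergers (S1, S3, S4, S10, S12, S13) and a low group spanning 0.11 to 0.69. The overall mean is about 0.57, consistent with the average quoted in the Discussion.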

SENSORY-MOTOR INTERACTIONS IN SPEECH PERCEPTION, 
PRODUCTION, AND IMITATION 

At the neural level, the typical networks for speech production and perception were observed, in agreement with previous studies on vowel production and perception (Ozdemir et al., 2006; Soros et al., 2006; Terumitsu et al., 2006; Brown et al., 2008; Ghosh et al., 2008; Grabski et al., 2013). Our results also agree with previous studies on voluntary imitation of speech (Damasio and Damasio, 1980; Caramazza et al., 1981; Bartha and Benke, 2003; Molenberghs et al., 2009; Irwin et al., 2011; Reiterer et al., 2011) and fast overt repetition (Peschke et al., 2009) regarding the involvement of brain regions of the dorsal stream in deliberate imitation processes. Contrary to the "direct matching hypothesis," however, we did not observe the whole dorsal stream, and the IFG in particular, to be significantly more activated overall during deliberate imitation of speech (Irwin et al., 2011). In our results, only four ROIs of the dorsal stream, the auditory cortex and Wernicke's area bilaterally, were found to vary significantly in activation from the passive perception task to the perception step of deliberate imitation. Unexpectedly, greater activation was observed for the passive perception task. Nevertheless, several ROIs of the dorsal stream, including the right SMG, showed an activity that correlated negatively with the degree of imitation during the perception step of the imitative process. Peschke et al. (2009) also identified such a region in the right inferior parietal area, though with a positive correlation.

The first questions addressed in this study were whether phonetic convergence relies on the same mechanism and neural network as deliberate imitation, and to what extent brain regions related to sensori-motor integration are involved in that potentially shared network. Neither the whole brain analysis nor the ROI analysis showed any significant modulation in brain activation between the two tasks of deliberate and unconscious imitation (see Tables 4, 5). Furthermore, the ROI analysis revealed that the activation of several regions in the dorsal stream (the auditory cortex and the SMG, bilaterally, as well as the left Wernicke's area) correlated negatively with the degree of imitation during the perception step of the imitation processes. All these observations support the idea that both deliberate and unconscious imitations are based on the same mechanism and neural network, involving regions of the dorsal stream. Unlike Leslie et al. (2004), who compared deliberate and unconscious face imitation, we observed neither a right lateralization in unconscious speech imitation nor more bilateral activations for deliberate speech imitation. Either observation would have supported the idea of a "voice mirroring system" in the right hemisphere, as they suggested for face imitation.

Another question concerned the step of the perception-production process at which imitation occurs: is imitation part of the perception process, of the production process, or of both? The ROI analysis revealed that the activation in several regions of the dorsal stream was significantly modulated between the vowel production and perception reference tasks and both the perception and production steps of unconscious imitation. For the perception step, this concerned the auditory cortex and Wernicke's area, bilaterally; for the production step, the right auditory cortex, the right supramarginal gyrus, and the left insula. In the case of deliberate imitation, significant changes in activation were also found in these ROIs, but only for the perception step, as compared to the vowel perception reference task. Finally, ROIs in the dorsal stream whose activation correlated with the degree of imitation were found for the perception step of imitative processes only. No such region was found for the production step or for the whole imitative processes. These observations support the idea that (1) the imitation process requires both the perception and production steps of the sensori-motor loop, and that (2) the degree of imitation is determined by processes occurring during the perception step. This second point supports the hypothesis that perception intrinsically includes an automatic update of sensori-motor representations from the speech inputs.

A last question dealt with the degree of control and awareness that one can have over phonetic convergence and its inhibition. The behavioral results of this study showed that phonetic convergence can be inhibited to some extent. Great inter-speaker variability was observed: some speakers were able to inhibit this unconscious imitation completely (or even to overcompensate for it), others only partially, while some speakers could not inhibit it at all. At the neural level, no additional region or network, beyond the typical networks of speech production and perception, appeared to be specifically involved in imitation inhibition. It is worth noting that a significant activation was observed during the perception step of deliberate and inhibited imitation in the dorsolateral prefrontal region BA46, an area commonly associated with attention, resource allocation, and verbal self-monitoring (Indefrey and Levelt, 2004). That region is also known to have connections with temporal areas and to play a role in auditory processes (Romanski et al., 1999). However, the activity of that region was not found to be significantly greater in deliberate and inhibited imitation than in passive perception or unconscious imitation, which does not allow us to speculate further on its role in imitation processes. In contrast, modulated activation was observed in the left insular cortex, a brain region involved, amongst other functions, in self-awareness and inter-personal experience. This is consistent with previous studies showing that resisting motor mimicry involves cortical areas that are required to distinguish between self-generated and externally triggered motor representations (Brass et al., 2003, 2005; Spengler et al., 2010).

CONCLUSIONS AND PERSPECTIVES 

The different behavioral and neural observations of this study support the hypothesis that phonetic convergence may not only be driven by social or communicative motivations, but may primarily be the consequence of an automatic process of sensorimotor recalibration. This has important implications for speech production and perception, and for understanding how internal models and phonetic representations are learnt and updated.

Indeed, many previous studies have shown how speakers modify their speech production to compensate for perturbations of their auditory or proprioceptive feedback (Abbs and Gracco, 1984; Houde and Jordan, 1998; Jones and Munhall, 2000; Villacorta et al., 2007; Shiller et al., 2009; Cai et al., 2010). After-effects of these compensations were observed on both speech production and perception (Nasir and Ostry, 2009; Shiller et al., 2009), reflecting an update of sensori-motor representations in response to modifications of the sensory feedback. Complementary to these studies, our experiment brings new arguments supporting the idea that sensorimotor representations and internal models are continuously updated. These models map speech motor commands onto their sensory consequences. The update would rely not only on the comparison between sensory feedback and the predicted consequences of actions, but also on the comparison between our own production and external speech inputs provided by others.

This idea of "comparison" was explored by several neurofunctional studies that suggested the existence of an "auditory-error module," supposed to be located in the posterior STG, more specifically in the planum temporale (Tourville et al., 2008). The same authors also proposed the existence of a "somatosensory-error module," assumed to be located in the SMG and the left anterior insula, and modulated when the somatosensory feedback is perturbed (Golfinopoulos et al., 2011). Interestingly, these regions are exactly those whose activation was modulated between the different imitative tasks of our study, and whose activation correlated significantly with the degree of imitation.
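The recalibration idea can be illustrated with a deliberately simplified error-driven update. At each exposure, the speaker's f0 target moves a fixed fraction of the way toward the f0 heard from another talker. This toy model, its learning-rate parameter, and the function name are assumptions for illustration only. It is not the model proposed in this paper or in the studies cited above, and the f0 values are hypothetical.

```python
from typing import List


def recalibrate(own_f0: float, heard_f0: float, rate: float,
                n_exposures: int) -> List[float]:
    """Toy delta-rule update of a speaker's f0 target.

    After each exposure the target moves by rate * (heard - own), so the
    remaining gap shrinks by a factor (1 - rate) per exposure.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    trajectory = []
    for _ in range(n_exposures):
        own_f0 += rate * (heard_f0 - own_f0)  # error = external input - own target
        trajectory.append(own_f0)
    return trajectory


# Hypothetical values: habitual target 110 Hz, interlocutor at 120 Hz.
print(recalibrate(own_f0=110.0, heard_f0=120.0, rate=0.25, n_exposures=4))
```

With rate = 1 the speaker copies the heard f0 after a single exposure. Smaller rates produce gradual, partial convergence, one simple way to picture inter-individual differences in the degree of imitation.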

This possible involvement of sensorimotor recalibration processes also has implications for the communicative and social aspects of phonetic convergence. Imitation may facilitate communication not only by improving our likeability or our intelligibility for the interlocutor, but also by helping us to better understand our interlocutor (his/her feelings, attitudes, and speech). For instance, it has been shown that imitating someone else's accent subsequently improves our appreciation of this accent (Adank et al., 2013). Covert imitation also facilitates the prediction of upcoming words in sentences in adverse listening conditions (Adank et al., 2010) and, to a more limited extent, the recognition of single words (Nguyen et al., 2012).

Building on these findings, future studies should investigate the involvement of sensori-motor recalibration processes in phonetic convergence and their potential to explain higher-level communicative and social effects. These include inter-individual differences and phonetic talent (i.e., the ability to learn a second language), empathy and likability, and intelligibility enhancement.